search overhead
On Hash-Based Work Distribution Methods for Parallel Best-First Search
Parallel best-first search algorithms such as Hash Distributed A* (HDA*) distribute work among the processes using a global hash function. We analyze the search and communication overheads of state-of-the-art hash-based parallel best-first search algorithms, and show that although Zobrist hashing, the standard hash function used by HDA*, achieves good load balance for many domains, it incurs significant communication overhead since almost all generated nodes are transferred to a different processor than their parents. We propose Abstract Zobrist hashing, a new work distribution method for parallel search which, instead of computing a hash value based on the raw features of a state, uses a feature projection function to generate a set of abstract features which results in a higher locality, resulting in reduced communications overhead. We show that Abstract Zobrist hashing outperforms previous methods on search domains using hand-coded, domain specific feature projection functions. We then propose GRAZHDA*, a graph-partitioning based approach to automatically generating feature projection functions. GRAZHDA* seeks to approximate the partitioning of the actual search space graph by partitioning the domain transition graph, an abstraction of the state space graph. We show that GRAZHDA* outperforms previous methods on domain-independent planning.
Abstract Zobrist Hashing: An Efficient Work Distribution Method for Parallel Best-First Search
Jinnai, Yuu (The University of Tokyo) | Fukunaga, Alex (The University of Tokyo)
Hash Distributed A* (HDA*) is an efficient parallel best first algorithm that asynchronously distributes work among the processes using a global hash function. Although Zobrist hashing, the standard hash function used by HDA*, achieves good load balance for many domains, it incurs significant communication overhead since it requires many node transfers among threads. We propose Abstract Zobrist hashing, a new work distribution method for parallel search which reduces node transfers and mitigates communication overhead by using feature projection functions. We evaluate Abstract Zobrist hashing for multicore HDA*, and show that it significantly outperforms previous work distribution methods.
Evaluation of a Simple, Scalable, Parallel Best-First Search Strategy
Kishimoto, Akihiro, Fukunaga, Alex, Botea, Adi
Large-scale, parallel clusters composed of commodity processors are increasingly available, enabling the use of vast processing capabilities and distributed RAM to solve hard search problems. We investigate Hash-Distributed A* (HDA*), a simple approach to parallel best-first search that asynchronously distributes and schedules work among processors based on a hash function of the search state. We use this approach to parallelize the A* algorithm in an optimal sequential version of the Fast Downward planner, as well as a 24-puzzle solver. The scaling behavior of HDA* is evaluated experimentally on a shared memory, multicore machine with 8 cores, a cluster of commodity machines using up to 64 cores, and large-scale high-performance clusters, using up to 2400 processors. We show that this approach scales well, allowing the effective utilization of large amounts of distributed memory to optimally solve problems which require terabytes of RAM. We also compare HDA* to Transposition-table Driven Scheduling (TDS), a hash-based parallelization of IDA*, and show that, in planning, HDA* significantly outperforms TDS. A simple hybrid which combines HDA* and TDS to exploit strengths of both algorithms is proposed and evaluated.
Evaluations of Hash Distributed A* in Optimal Sequence Alignment
Kobayashi, Yoshikazu (Tokyo Institute of Technology) | Kishimoto, Akihiro (Tokyo Institute of Technology) | Watanabe, Osamu (Tokyo Institute of Technology)
Hash Distributed A* (HDA*) is a parallel A* algorithm that is proven to be effective in optimal sequential planning with unit edge costs. HDA* leverages the Zobrist function to almost uniformly distribute and schedule work among processors. This paper evaluates the performance of HDA* in optimal sequence alignment. We observe that with a large number of CPU cores HDA* suffers from an increase of search overhead caused by reexpansions of states in the closed list due to nonuniform edge costs in this domain. We therefore present a new work distribution strategy limiting processors to distribute work, thus increasing the possibility of detecting such duplicate search effort. We evaluate the performance of this approach on a cluster of multi-core machines and show that the approach scales well up to 384 CPU cores.
On the Scaling Behavior of HDA*
Kishimoto, Akihiro (Tokyo Institute of Technology and JST PRESTO) | Fukunaga, Alex (University of Tokyo) | Botea, Adi (NICTA and The Australian National University)
HDA* is a simple, parallelization of A* where work is asynchronously distributed among the nodes by a global hash function. Using up to 1024 cores on a large distributed memory cluster, we evaluate HDA* for a domain-independent planner as well an application-specific 24-puzzle solver. We show that HDA* scales fairly well on a large cluster using up to 1024 cores. Our analysis of the scaling behavior shows that on a cluster of multicore nodes, using only a subset of the available cores and leaving some cores idle can, surprisingly, lead to better results.
Scalable, Parallel Best-First Search for Optimal Sequential Planning
Kishimoto, Akihiro (Tokyo Institute of Technology and JST PRESTO) | Fukunaga, Alex (Tokyo Institute of Technology) | Botea, Adi (NICTA and The Australian National University)
Large-scale, parallel clusters composed of commodity processors are increasingly available, enabling the use of vast processing capabilities and distributed RAM to solve hard search problems. We investigate parallel algorithms for optimal sequential planning, with an emphasis on exploiting distributed memory computing clusters. In particular, we focus on an approach which distributes and schedules work among processors based on a hash function of the search state. We use this approach to parallelize the A* algorithm in the optimal sequential version of the Fast Downward planner. The scaling behavior of the algorithm is evaluated experimentally on clusters using up to 128 processors, a significant increase compared to previous work in parallelizing planners. We show that this approach scales well, allowing us to effectively utilize the large amount of distributed memory to optimally solve problems which require hundreds of gigabytes of RAM to solve. We also show that this approach scales well for a single, shared-memory multicore machine.